ABSTRACT

The purpose of this research was to determine the relative roles of
experience/learning and selected visual aptitude factors on the ability
to detect and identify indications of defects in X-ray film of welds and
other materials. Penetrameter Detection and Defect Identification Tests
were developed to measure the ability of radiographic film inspectors to
detect and identify weld defects. These tests and the Ortho-Rater
examination were given to Navy certified film inspectors. Test results
and visual examination results were compared to determine the
relationship between vision and film reading skills. Both film tests were
readministered six months later to determine film inspector reliability.
No significant relationship was found to exist between the selected
visual aptitude factors and film reading ability. Low levels of inter- and intrasubject
reliability were found to exist on both the detection and identification
tests, and a significant intrasubject relationship was found between
identification test reliability and experience. This suggests that
learning plays an important role in the acquisition of film reading
skills. Further research in new training methods is recommended based on
the above findings. (Author/HM)

SUMMARY AND CONCLUSIONS



Problem 

The purpose of this research was to determine the relative roles of
experience/learning and selected visual aptitude factors on the ability
to detect and identify indications of defects in X-ray film of welds
and other materials.

Background and Requirements 

Today's Navy continues to adopt sophisticated systems whose components
are fabricated from exotic materials, are subjected to greater stresses,
and require extremely thorough radiographic testing (RT) to ensure their
safe and reliable operation. To meet this requirement, large numbers
of film readers (RT inspectors) are needed. The current training program
cannot accommodate this need because RT training technology has not
kept pace with hardware technology.

Approach 

Two tests were developed to measure the ability of film readers to
detect and identify welding defects as shown on X-ray film. These
tests and the Ortho-Rater examination were given to Navy certified film
inspectors. Results of the film tests were compared to the visual
examination results to determine the relationship between vision and
film reading skills. Both film tests were readministered six months
later to determine whether experienced film readers were more reliable
than less experienced film readers.

Findings, Conclusions, Recommendations

1. No significant relationship was found to exist between the 
selected visual aptitude factors measured and film reading ability. 
The problem of providing more RT inspectors cannot be solved by 
imposing more stringent visual selection factors on trainees. (Page 11) 

2. Low levels of interrater agreement were found to exist on both
the detection and identification tests; a significant intrasubject
relationship was found between identification reliability and
experience. This suggests that learning plays an important role in
the development of film reading skills. (Pages 11 and 12)

3. Based on the above findings, it was recommended that research be
conducted to determine optimum learning strategies in the dimensions
underlying film reading skills, and that a new RT inspector training
program be developed to incorporate the findings of that research.
Specific areas of study should include:






a. The perception of subtle changes in shades of gray in a
darkened black/white environment.

b. The detection of shapes with poorly defined outlines
caused by fuzziness of radiographic film.

c. The conversion of brightness and contrast on the
radiographic film to material density.

d. The conversion of three-dimensional effects of solid
form on a flat projection. (Page 12)

RELATIVE ROLES OF LEARNING AND VISUAL FACTORS ON
RADIOGRAPHIC INSPECTOR PERFORMANCE 



A. Introduction 



This is the first research effort by the Naval Personnel and Training
Research Laboratory (NPTRL) concerned with determining optimum learning
strategies for reading X-ray film of welds and other materials.¹ Research
into the basic dimensions underlying film reading skills and abilities is
required because of a lack of applicable knowledge and a new requirement
for large numbers of certified radiographic testing (RT) inspectors (film
readers) to fill current fleet shortages. This requirement results from
the Navy's increased adoption of sophisticated systems whose components
are fabricated from exotic materials, are exposed to greater stresses,
and therefore require extremely thorough testing to ensure their safe and
reliable operation. Currently, only RT inspection provides this level of
assurance.

To determine whether welds are safe and reliable, radiographs are taken 
using both X- and gamma ray sources. Defects hidden within the interior of 
the welds appear on the exposed radiographic film as minute, subtle changes 
in shades of gray. The task of the RT inspector (film reader) is to first
detect these minute, subtle changes of gray and then identify which of over 
15 types of defects they may be. These defects may appear differently 
depending on the location in the weld and within the weld configuration 
itself (e.g., pipe, plate, casting). 

To further compound the task, the RT inspector must be able to identify
several types of film and processing artifacts that can appear as weld
defects or mask indications of defects. Additionally, he must be able to
overcome the confusing influences of fuzziness in the shadow images of the
X-ray process, the photographic representation of density in achromatic
black and white, and the three-dimensional effects of solid form on a flat
projection. It is generally conceded by those in the field that this task
is extremely complex and that little inter-inspector agreement exists today.

In RT inspector training today, the student is given radiographs
containing examples of defects to review. If and when the student's
judgments of defects approximate those of his instructor, he is
certified as an RT inspector; otherwise, and this is what usually happens,
he fails. This system of training is similar to the system used to train
apprentices in the medieval guilds. In that system, a student apprenticed
himself to a master until he could successfully mimic the skills and
judgments of the master. At that time, he was adjudged competent. Under
such a system, the apprentice typically duplicated not only the master's
skill, wisdom, and knowledge but also his biases, misconceptions, and
foibles. Fundamental concepts of learning (e.g., feedback, reinforcement,
structuring of material, provisions for individual differences) were not
used in a systematic manner to maximize learning.

¹ Henceforth the term weld will be used to include other materials.

This system is used to train RT inspectors because radiographs are one
of a kind: they are pictures of actual ship repairs. No method is
currently available to reproduce industrial X-rays with 100% fidelity;
therefore, training programs vary from site to site depending on the
radiographs available. The X-rays used for training also change
frequently because radiographs are so easily damaged. The slightest mark
or scratch on a film appears as a defect, after which the X-ray can no
longer be interpreted correctly.

The present research program was undertaken to provide insights into 
the dimensions underlying film reading skills and to point the way for 
development of RT inspector training. As a starting point, the specific 
purpose of this study was to determine the relative roles of experience/ 
learning and selected visual aptitude factors on the ability to detect and 
identify indications of defects on X-ray film of welds and other materials. 



B. Methodology

1. Design 

A penetrameter detection test and a defect identification test were
constructed to measure the ability of film readers to detect and identify
defects. These tests and the Ortho-Rater visual examination were given
to certified film inspectors. Results of the film tests were compared
to the visual examination results to determine the relationship between
vision and film reading skills. Approximately six months after their
original administration, both film reading tests were readministered to
determine whether experienced film readers were more reliable than less
experienced film readers.

a. Penetrameter Detection Test (Penny Test). The penny test was
designed to measure the ability of film readers to detect minute, subtle
changes in shades of gray on radiographic film.

A penetrameter (penny) is a device used by radiographers to
demonstrate the quality of a radiograph. It is a small, thin piece of
metal with three holes of different sizes that is placed on the part to
be RT inspected. The penetrating radiation passes through the three holes
and makes small dark images on the film. The quality of the film is then
judged by whether the penetrameter outline and the dark spots can be
detected.

For the penny test, 100 radiographs of welds of varying thicknesses,
materials, and RT processes were selected. The penny images from these
radiographs were then cut out and mounted on cards for ease of
administration and control, and to allow handling without damaging the
film.

b. Defect Identification Test. The defect test measured the ability
of film readers to identify indications of defects on X-ray film. For
this test, 96 radiographs containing representative samples of the various
defects in various configurations were selected. These radiographs were
chosen to minimize the demands on detection skills while focusing on the
subject's identification skills. This was accomplished by selecting
radiographs with essentially a single prominent defect. The defect areas
of the radiographs were then cut out and mounted in the same way as the
penny test images. This procedure also reduced the area to be viewed to
about 1/20 of an actual radiograph.

c. Ortho-Rater Test. To determine visual aptitude factors related to
reading X-ray film, the following types of experts were consulted:
radiologists, ophthalmologists, and human factors research personnel at
the Naval Electronics Laboratory Center (NELC). Based on these
discussions, the Bausch & Lomb Ortho-Rater was selected to measure the
visual aptitudes. The Ortho-Rater measures near and far acuity, near and
far phoria, depth perception, and color perception.

2. Subjects 

The subjects were 12 certified film readers attached to the NDT School
in February 1972. The NEC codes certifying the subjects as film readers
are presented in Table 1. The subjects' film reading experience averaged
four years and ranged from six months to ten years.

3. Apparatus

Standard X-ray film viewers in the Film Reading Room at the NDT School
were used to illuminate the radiographs on both tests. The standard
Ortho-Rater and accompanying software were used to administer the
Ortho-Rater vision test.

4. Procedure 

All subjects were given an eye examination, the penny test, and then
the defect test.

TABLE 1

Summary of Subjects' Film Reading Certifications

NEC CODE   TITLE                   NUMBER
4935       Nuclear Inspector          9
4936       Nonnuclear Inspector       3
4938       Nuclear Examiner           5
4939       Nonnuclear Examiner        4

Note: Some subjects have multiple certifications.



In the penny test, the subjects were instructed to report when they saw 
the small dark image that corresponded to one of the three holes in the 
penny. 

For the defect test, the subjects were instructed to identify all 
defects within each film chip that they would consider in the acceptance 
or rejection of a weld. They were asked to be as specific and as accurate 
as possible. 

Test scoring was complicated because it was not possible to develop
an absolute scoring key. The only way to validate whether a film reader
has read a film correctly is to compare his analysis with the specimen
itself. This means the actual specimen must be cross-sectioned, polished,
and acid-etched. For this study, over 1,000 radiographs of actual
shipboard repairs were reviewed, and it is clearly not feasible to tear
apart actual shipboard repairs.

An alternative method of designing the test would have been to weld
specific defects into specimens, radiograph them, and then cross-section
them. This method was not used because it would have taken months and
been extremely expensive. Thus, when viewing each chip in the tests,
there was no way to make an absolute right-wrong determination of whether
a dark image was present on a penny or whether a shadow on a piece of
X-ray film was, in fact, a specific defect. A review of statistical tests
revealed few that could provide meaningful analysis of data of this kind.

After much investigation, the level of relative agreement was selected as
the basic measure for both tests. This measure provided a method of
analysis with sufficient power to answer the questions set forth in the
problem statement.

a. Penny Test Scoring. The penny test was scored for interrater
agreement and intrarater reliability. For interrater agreement, each item
was analyzed individually as follows. If all subjects reported that they
saw (or did not see) the required dark image on a single item or film
chip, the item received a score of "0". If all subjects but one reported
seeing (or not seeing) the image, the item received a score of "1". The
scoring progressed in this manner up to a maximum score of "6", which
indicated that six subjects responded "yes" and six responded "no". In
effect, each item's score is the number of subjects in the minority.
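The item score just described is the size of the minority response, so it can be computed directly from the yes/no counts. A minimal Python sketch, assuming each item's responses are recorded as one boolean per subject (True meaning the subject reported seeing the hole image); the function names and the example responses are hypothetical, since the raw responses are not reproduced in this report:

```python
from collections import Counter

def penny_item_score(responses):
    """Interrater score for one penny-test item.

    responses: one bool per subject (True = reported seeing the hole image).
    Returns the number of subjects in the minority: 0 when all agree,
    1 when all but one agree, up to 6 for a 6-6 split among 12 subjects.
    """
    yes = sum(1 for seen in responses if seen)
    no = len(responses) - yes
    return min(yes, no)

def agreement_distribution(items):
    """Tally how many items received each score (0 = complete agreement)."""
    return Counter(penny_item_score(item) for item in items)

# Hypothetical responses from 12 subjects on three items
example_items = [
    [True] * 12,               # everyone saw the image -> score 0
    [True] * 11 + [False],     # one dissenter -> score 1
    [True] * 6 + [False] * 6,  # even split -> score 6
]
print([penny_item_score(item) for item in example_items])  # [0, 1, 6]
```

Applied to all 100 items, a tally like `agreement_distribution` yields the count of items at each level of agreement, which is the quantity plotted in Figure 1.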

b. Defect Test Scoring. The defect test was scored for interrater
identification agreement and intrarater reliability. For interrater
agreement, both the number and kind of defects were tallied for each
frame. For intrarater reliability, because these data were basically
non-ipsative in nature, it was necessary to consider both how much a
subject agreed with himself and how much he disagreed with himself.
Additionally, the measure had to allow for differences in the number of
defects each subject saw per frame. An agreement/disagreement (A/D) ratio
was derived from the data to meet these criteria.

Table 2 contains the scoring strategy for the A/D ratio. Each subject's
responses on both administrations were compared in this manner. For
example, Table 2 indicates that the subject saw only porosity both times
he read Frame 1. Thus, he got an agreement score of "1" and a
disagreement score of "0". In Frame 2, the subject saw IM both times for
an agreement score of "1" but, because he also saw BT in August, he got a
disagreement score of "1". In Frame 3, the identification of BT and RO on
both tests yielded an agreement score of "2", and the single MT response
counted as a "1" in the disagreement column. Finally, in Frame 4, the
subject saw three different defects over both administrations for an
agreement score of "0" and a disagreement score of "3". In other words,
each defect named on both administrations counts as one agreement, and
each defect named on only one administration counts as one disagreement.

Each reader's responses to the 96 frames were scored in this manner.
First, the agreements and disagreements were totaled for each subject.
Then, the total agreements were divided by the total disagreements to
obtain the A/D ratio for each subject. A/D ratios greater than 1.00 meant
that a subject tended to agree with himself more than he disagreed.
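The scoring rule illustrated by Table 2 amounts to set arithmetic on the two responses to a frame: agreements are the defects common to both, and disagreements are the defects named only once. A minimal Python sketch, assuming each response is recorded as a set of the abbreviations from the Table 2 key; the function names are hypothetical, and the report does not say how "no defect" responses were handled (here an empty set simply adds nothing to either count):

```python
def score_frame(first, second):
    """Agreement and disagreement scores for one frame read twice by one subject.

    first, second: sets of defect abbreviations reported on each administration.
    A defect named both times is one agreement; a defect named only once is
    one disagreement (the symmetric difference of the two sets).
    """
    return len(first & second), len(first ^ second)

def ad_ratio(frames):
    """Total agreements divided by total disagreements over all frames."""
    total_agree = sum(score_frame(a, b)[0] for a, b in frames)
    total_disagree = sum(score_frame(a, b)[1] for a, b in frames)
    if total_disagree == 0:
        # Perfect self-agreement; the report does not say how this case was handled.
        return float("inf")
    return total_agree / total_disagree

def complete_agreement_count(frames):
    """Frames on which the subject gave exactly the same response both times."""
    return sum(1 for a, b in frames if a == b)

# The four frames of Table 2 (February vs. August responses)
table2 = [
    ({"P"}, {"P"}),
    ({"IM"}, {"IM", "BT"}),
    ({"MT", "BT", "RO"}, {"BT", "RO"}),
    ({"P"}, {"CP", "MT"}),
]
print([score_frame(a, b) for a, b in table2])  # [(1, 0), (1, 1), (2, 1), (0, 3)]
print(ad_ratio(table2))                        # 0.8
```

Over these four frames the subject would total 4 agreements and 5 disagreements, an A/D ratio of .80, slightly below the 1.00 break-even point; in the study each ratio was computed over all 96 frames. Assuming "complete agreement within raters" means an identical response on both administrations, `complete_agreement_count` gives the per-subject count averaged in the Results (1 here, Frame 1 only).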






TABLE 2

Example of Subject's Defect Test Responses and Scoring for A/D Ratio

FRAME NO.   SUBJECT RESPONSE              SCORING
            1st Admin.    2nd Admin.      Agree   Disagree
            (Feb.)        (Aug.)
1           P             P               1       0
2           IM            IM-BT           1       1
3           MT-BT-RO      BT-RO           2       1
4           P             CP-MT           0       3

Key to defect abbreviations for Table 2:

BT - Burn Through

CP - Crater Pit

ER - Excessive Reinforcement

FM - Foreign Material

IM - Incomplete Insert Melt

MT - Melt Through

P - Porosity

RO - Root Oxidation

T - Tungsten Inclusion






C. Results

1. Penny Test

Figure 1 and Table 3 list the results of the penny test.

[Figure 1 (bar chart): number of penny test items (vertical axis) at each level of rater agreement attained (horizontal axis: complete agreement, and one through six disagreements).]

Figure 1. Interrater Agreement for Penny Test.



From Figure 1 it can be seen that interrater agreement was less than
satisfactory. For example, total agreement was reached on only 44 of the
100 items, and on another 24 items one rater's judgment differed from all
the others.



TABLE 3

Penny Test Reliability Results

Variables                               Penny Test r
Intrarater Reliability                  .85*
Intrarater Reliability x Experience     .16
Intrarater Reliability x Vision         .13
Test Reliability (Split Half)           .94*

*Significant at the .01 level or less.
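The report does not describe how the split-half coefficients in Tables 3 and 5 were computed. One common procedure, shown here as an assumption rather than the authors' method, splits the items into odd- and even-numbered halves, correlates the subjects' half-test totals, and steps the result up to full length with the Spearman-Brown formula. How each item was scored for this purpose is also not stated, so the per-item scores are left generic and the function names are hypothetical:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    item_scores: one list per subject of numeric per-item scores, with items
    in the same order for every subject.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))
```

For example, a half-test correlation of .50 steps up to a full-length reliability of about .67; the correction is needed because each half contains only half the items of the full test.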



From Table 3, it can be seen that intrarater reliability was high
(r = .85) but that it was unrelated to both experience (r = .16) and
vision (r = .13). Further analysis of intrarater reliability data
revealed that the mean number of items on which a rater disagreed with
himself was 15.8, with a range of 7 to 48 disagreements. It may be noted
that, prior to the second administration of the penny test, the
subjects' estimated mean number of disagreements was 7.0. Across both
administrations, all raters agreed on only 16 items of the penny test.
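The intrarater figures above (a mean of 15.8 self-disagreements, a range of 7 to 48, and 16 items with unanimous agreement) are simple counts over the two administrations. A minimal sketch of those counts, assuming each subject's answers are stored as a list of booleans in a fixed item order on both administrations; the report does not publish the raw responses, and the function names are hypothetical:

```python
def self_disagreements(first, second):
    """Items on which one subject's yes/no answer changed between administrations."""
    return sum(1 for a, b in zip(first, second) if a != b)

def disagreement_summary(first_admin, second_admin):
    """Mean, minimum and maximum self-disagreement count across subjects.

    first_admin, second_admin: one list of bools per subject, with the same
    subject order and the same item order on both administrations.
    """
    counts = [self_disagreements(f, s) for f, s in zip(first_admin, second_admin)]
    return sum(counts) / len(counts), min(counts), max(counts)

def unanimous_items(first_admin, second_admin):
    """Items on which every subject gave the same answer on both administrations."""
    n_items = len(first_admin[0])
    count = 0
    for i in range(n_items):
        answers = {row[i] for row in first_admin} | {row[i] for row in second_admin}
        if len(answers) == 1:  # only one distinct answer across all readings
            count += 1
    return count
```

Note that `unanimous_items` is stricter than either administration alone: an item counts only if all raters agree with each other and with themselves, which is why it can be much smaller than the 44 items of complete agreement on the first administration.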

2. Defect Test 

Defect test interrater agreement also was less than satisfactory.
Subjects reported a mean of 1.1 defects per frame, but across subjects
that one defect was identified as 5^ different defects. The number of
different names given to that defect ranged from 2 to 9 per frame, and
there was not a single frame on which all subjects agreed.

Table 4 presents two typical examples of responses to defect test film 
chips. 






TABLE 4

Two Examples of Responses on Individual Frames of Defect Test

FRAME A                  FRAME B
Defect        f          Defect        f
No Defects    1          No Defects    4
RO            5          FM            1
MT            4          T             1
P             2          CP            1
BT            2          P             1
IM            1          IM            1
ER            1
T             1

Note: Raters may report more than one defect.

Key to defect abbreviations for Table 4:

BT - Burn Through

CP - Crater Pit

ER - Excessive Reinforcement

FM - Foreign Material

IM - Incomplete Insert Melt

MT - Melt Through

P - Porosity

RO - Root Oxidation

T - Tungsten Inclusion

The wide range of defect names in Table 4 is especially evident in
Frame B, where four subjects saw no defects and the remaining seven each
saw a different defect.
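The spread reported for the defect test (2 to 9 different names per frame, and no frame on which everyone agreed) can be read directly off per-frame tallies such as those in Table 4. A small sketch using the Frame A counts from Table 4; the function names are hypothetical, and the sketch treats "No Defects" as a response but not as a defect name:

```python
from collections import Counter

# Frame A of Table 4: number of raters giving each response.
# A rater may report more than one defect, so the counts need not sum to
# the number of raters.
frame_a = Counter({"No Defects": 1, "RO": 5, "MT": 4, "P": 2, "BT": 2,
                   "IM": 1, "ER": 1, "T": 1})

def distinct_defect_names(tally):
    """Number of different defect names reported for one frame."""
    return sum(1 for name, f in tally.items() if name != "No Defects" and f > 0)

def is_unanimous(tally, n_raters):
    """True only if every rater gave the same single response."""
    reported = [name for name, f in tally.items() if f > 0]
    return len(reported) == 1 and tally[reported[0]] == n_raters

print(distinct_defect_names(frame_a))  # 7
```

Frame A yields seven different defect names, within the reported range of 2 to 9, and `is_unanimous(frame_a, 12)` is False (12 raters assumed from the Subjects section), as it would be for every frame in the test.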



TABLE 5

Defect Test Reliability Results

Variable                                Relationship
Intrarater Reliability                  Mean A/D Ratio = .93
Intrarater Reliability x Experience     r = .76*
Intrarater Reliability x Vision         r = .05
Test Reliability (Split Half)           r = .94*

*Significant at the .01 level or less.



Intrarater reliability again was based on each subject's A/D
(agreement/disagreement) ratio. The mean A/D ratio was .93, with a range
of .54 to 1.50. In addition, the mean number of items with complete
agreement within raters was 36.6, with a range of 22 to 47.

From Table 5, it can also be seen that intrarater reliability was
related to experience (r = .76) and unrelated to vision (r = .05). It
should be noted that all subjects' vision was within the range required
for tasks demanding high visual acuity as defined by Bausch & Lomb.
Several subjects had near-perfect scores on acuity tests for both near
and far vision. All subjects scored well on the other visual aptitude
factors tested.
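The report does not name the significance test behind the asterisks in Tables 3 and 5. As a check on the defect-test correlations, the usual t test for a Pearson correlation can be applied, assuming one pair of scores for each of the 12 subjects (df = 10). This is a sketch under that assumption, not the authors' computation; the critical value is the standard two-tailed table value for df = 10 at the .01 level:

```python
import math

# Two-tailed critical value of t for df = 10 at alpha = .01 (standard t table).
T_CRIT_DF10_ALPHA01 = 3.169

def t_for_r(r, n):
    """t statistic for testing whether a Pearson r from n pairs differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

for label, r in [("A/D ratio x experience", 0.76), ("A/D ratio x vision", 0.05)]:
    t = t_for_r(r, 12)
    significant = abs(t) > T_CRIT_DF10_ALPHA01
    print(f"{label}: r = {r:.2f}, t = {t:.2f}, significant at .01: {significant}")
```

Under these assumptions r = .76 gives t of about 3.70, above the critical value of 3.169, while r = .05 gives t of about 0.16. This is consistent with Table 5 marking the experience correlation, but not the vision correlation, as significant.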
D. Discussion and Conclusions



The low levels of interrater agreement on both the detection (penny)
test and the identification (defect) test substantiate what is generally
conceded by field personnel: little interinspector agreement exists
today.

The detection problem is even more confounding than the penny test
results indicate. Detection generally requires that man first employ
"search" strategies, and often the configuration of the item searched for
is unknown. To detect the penny holes on the X-ray film, the subjects
knew the exact configuration of what they were trying to detect and
precisely where to look. Yet all subjects agreed on just 44% of the penny
"holes" in a task requiring 100% agreement.

The identification problem is as serious as the detection problem. Not
only was interrater agreement low on the defect test, but the film
inspectors did not agree completely on even a single frame. In most
cases, all subjects did see an indication of a defect on each film but
substantially disagreed as to its identification. Here again, the testing
problem was simplified by presenting the subject with a very small area to
view, about 1/20th of the area of an average radiograph,² and instructing
him to report the defects he saw. Thus, he may have been influenced to
identify a defect only because a set had been established for him to
identify a defect. The exact determination of the type of defect is very
important because certain defects are acceptable in certain situations
and other defects are not.

Logically, the film reading problem could be caused by deficient visual
abilities, inefficient learning, requiring man to do an impossible task,
or some mix of these. Concerning vision, the lack of a significant
relationship between vision and film reading ability (detection and
identification of indications of defects) suggests that the problem of
providing the fleet with critically needed film inspectors cannot be
solved by merely imposing more stringent visual selection factors on
prospective film inspection trainees. The fact that all subjects scored
so well on the visual tests indicates that the present visual selection
procedure is more than adequate.

The significant positive relationship between intrarater reliability
and experience (r = .76, p < .01) on the defect test indicates that,
within themselves, the raters have learned to be more consistent; that
is, film reading is within the realm of man's capabilities. The film
readers may not agree with others as to what a defect should be called,
but from one time to the next they are more likely to call a defect by
the same name. The problem then may reduce to providing an environment
that allows both students and certified film inspectors to become more
consistent with experience, and, most importantly, experience built on
learning activities designed to maximize agreement on both the detection
and identification of defects.

² In a pilot study, the full radiograph was used with comparable results.
Because reading entire radiographs was so time-consuming, the small chip
format was adopted.

E. Recommendations 

Based on the results of this research, it is recommended that: 

1. Research be conducted to determine optimum learning strategies in the 
dimensions underlying film reading skills. Specific areas of study 
should include: 

a. The perception of subtle changes in shades of gray in a darkened
black/white environment.

b. The detection of shapes with poorly defined outlines caused by the 
confusing influences of fuzziness of radiographic film. 

c. The conversion of brightness and contrast on the radiographic 
film to material density or thickness. 

d. The conversion of three-dimensional effects of solid form on a
flat projection.

2. Research be conducted to reproduce radiographs with 100% fidelity 
to be used for training purposes. 

3. A new film reader training program be developed that incorporates the
findings of the above research and maximizes the learning of film
reading skills.